Efficient Parameter Estimation of Generalizable Coarse-Grained Protein Force Fields Using Contrastive Divergence: A Maximum Likelihood Approach
نویسندگان
چکیده
Maximum Likelihood (ML) optimization schemes are widely used for parameter inference. They maximize the likelihood of some experimentally observed data, with respect to the model parameters iteratively, following the gradient of the logarithm of the likelihood. Here, we employ a ML inference scheme to infer a generalizable, physics-based coarse-grained protein model (which includes Go̅-like biasing terms to stabilize secondary structure elements in room-temperature simulations), using native conformations of a training set of proteins as the observed data. Contrastive divergence, a novel statistical machine learning technique, is used to efficiently approximate the direction of the gradient ascent, which enables the use of a large training set of proteins. Unlike previous work, the generalizability of the protein model allows the folding of peptides and a protein (protein G) which are not part of the training set. We compare the same force field with different van der Waals (vdW) potential forms: a hard cutoff model, and a Lennard-Jones (LJ) potential with vdW parameters inferred or adopted from the CHARMM or AMBER force fields. Simulations of peptides and protein G show that the LJ model with inferred parameters outperforms the hard cutoff potential, which is consistent with previous observations. Simulations using the LJ potential with inferred vdW parameters also outperforms the protein models with adopted vdW parameter values, demonstrating that model parameters generally cannot be used with force fields with different energy functions. The software is available at https://sites.google.com/site/crankite/.
منابع مشابه
Learning with Blocks: Composite Likelihood and Contrastive Divergence
Composite likelihood methods provide a wide spectrum of computationally efficient techniques for statistical tasks such as parameter estimation and model selection. In this paper, we present a formal connection between the optimization of composite likelihoods and the well-known contrastive divergence algorithm. In particular, we show that composite likelihoods can be stochastically optimized b...
متن کاملA Bayesian Nominal Regression Model with Random Effects for Analysing Tehran Labor Force Survey Data
Large survey data are often accompanied by sampling weights that reflect the inequality probabilities for selecting samples in complex sampling. Sampling weights act as an expansion factor that, by scaling the subjects, turns the sample into a representative of the community. The quasi-maximum likelihood method is one of the approaches for considering sampling weights in the frequentist framewo...
متن کاملMaximum Likelihood Drift Estimation for Multiscale Diffusions
We study the problem of parameter estimation using maximum likelihood for fast/slow systems of stochastic differential equations. Our aim is to shed light on the problem of model/data mismatch at small scales. We consider two classes of fast/slow problems for which a closed coarse-grained equation for the slow variables can be rigorously derived, which we refer to as averaging and homogenizatio...
متن کاملEvaluation of estimation methods for parameters of the probability functions in tree diameter distribution modeling
One of the most commonly used statistical models for characterizing the variations of tree diameter at breast height is Weibull distribution. The usual approach for estimating parameters of a statistical model is the maximum likelihood estimation (likelihood method). Usually, this works based on iterative algorithms such as Newton-Raphson. However, the efficiency of the likelihood method is not...
متن کاملInformation Geometry of Contrastive Divergence
The contrastive divergence(CD) method proposed by Hinton finds an approximate solution of the maximum likelihood of complex probability models. It is known empirically that the CD method gives a high-quality estimation in a small computation time. In this paper, we give an intuitive explanation about the reason why the CD method can approximate well by using the information geometry. We further...
متن کامل